Speech, Hearing and Language: work in progress, Volume 14

A CHOICE THEORY METHOD FOR EVALUATING AUDIOVISUAL PHONEME RECOGNITION

Author

  • Paul IVERSON
Abstract

This article describes a mathematical method, based on Choice Theory (e.g., Luce, 1963), that can be used to predict audiovisual phoneme confusion matrices from unimodal audio and visual data. The predictions made by this method can be compared to obtained levels of audiovisual processing to identify individuals whose audiovisual integration processes are not efficient. A reanalysis of Grant et al.'s (1998) audiovisual consonant confusion data is presented to evaluate this method. The results demonstrate that the method is effective at predicting audiovisual phoneme recognition responses, and suggest that Grant et al.'s subjects were highly efficient at integrating audiovisual information. Matlab code used in these analyses is available at http://www.phon.ucl.ac.uk/home/paul/CT/home.htm.

Introduction

Several methods have been developed to predict audiovisual phoneme confusion matrices from phoneme confusion data collected under unimodal audio and visual stimulus conditions (Blamey, 1989; Braida, 1991; Massaro, 1987). This article describes a new method, based on Choice Theory, that serves the same purpose. Compared to existing methods, the present method offers at least two advantages. First, it is designed to estimate audiovisual phoneme recognition under optimal-processing conditions (see Braida, 1991; Grant et al., 1998); that is, it estimates the highest level of audiovisual consonant recognition that is possible given the phonetic information available separately through the auditory and visual modalities.
Predictions based on optimal-processing assumptions are useful because they can be compared to obtained levels of performance to help identify individual patients whose cognitive/perceptual processes (e.g., processes that integrate the phonetic information from each modality and map it onto long-term memory representations for language) are not making efficient use of the available phonetic information (Grant et al., 1998; Grant & Seitz, 1998). Second, this method is mathematically less complex than the only other method proposed for estimating audiovisual performance under optimal-processing conditions, Braida's (1991) pre-labeling model. Braida's pre-labeling model is based on a multidimensional extension of Signal Detection Theory (e.g., Durlach & Braida, 1969; Green & Swets, 1966; Macmillan et al., 1988). It requires fitting the consonants to locations within a multidimensional audiovisual perceptual space, calculating response regions for each consonant within this space, and integrating a multidimensional Gaussian probability function for each consonant over each of these response regions. In contrast, the Choice Theory method used here does not require the consonants to be represented within a multidimensional space (although it is possible to represent Choice Theory coefficients ...
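The excerpt above does not give the method's equations, so the following is only a rough sketch of how a Choice Theory prediction of this kind might be set up, written in Python rather than the author's Matlab. It uses Luce's biased choice rule, P(r|s) = b_r η_sr / Σ_k b_k η_sk, and the standard closed-form similarity estimate η_ij = sqrt(p_ij p_ji / (p_ii p_jj)). The integration rule (treating d = -ln η as a distance and combining the two modalities as d_AV = sqrt(d_A² + d_V²)), the uniform response bias, all function names, and the toy 3x3 matrices are assumptions made for illustration and are not taken from the article.

```python
import math

def choice_probabilities(similarity, bias):
    """Luce's biased choice rule: P(r|s) = b_r * eta_sr / sum_k(b_k * eta_sk)."""
    rows = []
    for eta_s in similarity:
        weights = [b * eta for b, eta in zip(bias, eta_s)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def similarity_from_confusions(p):
    """Estimate eta_ij = sqrt(p_ij * p_ji / (p_ii * p_jj)).

    This recovers the similarities exactly when the confusion matrix
    really was generated by the biased choice rule.
    """
    n = len(p)
    return [[math.sqrt(p[i][j] * p[j][i] / (p[i][i] * p[j][j]))
             for j in range(n)] for i in range(n)]

def combine_similarities(eta_a, eta_v):
    """ASSUMED integration rule (not confirmed by the excerpt):
    d = -ln(eta) is treated as a distance, and the audio and visual
    distances add as orthogonal dimensions, d_av = sqrt(d_a^2 + d_v^2).
    """
    n = len(eta_a)
    combined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a, v = eta_a[i][j], eta_v[i][j]
            if a > 0 and v > 0:  # a zero similarity means infinite distance
                d_av = math.hypot(math.log(a), math.log(v))
                combined[i][j] = math.exp(-d_av)
    return combined

# Toy confusion matrices (rows: stimulus, columns: response).
# The values are made up for illustration and are not Grant et al.'s data.
audio = [[0.6, 0.3, 0.1],
         [0.3, 0.6, 0.1],
         [0.1, 0.1, 0.8]]
visual = [[0.5, 0.1, 0.4],
          [0.1, 0.8, 0.1],
          [0.4, 0.1, 0.5]]

eta_a = similarity_from_confusions(audio)
eta_v = similarity_from_confusions(visual)
eta_av = combine_similarities(eta_a, eta_v)

# Assumption: no response bias in the audiovisual condition.
uniform_bias = [1.0, 1.0, 1.0]
predicted_av = choice_probabilities(eta_av, uniform_bias)

for row in predicted_av:
    print([round(x, 3) for x in row])
```

Under this assumed rule, a pair of consonants that is well separated in either modality stays well separated in the combined prediction, which is the sense in which such a prediction is an upper bound on integration. Comparing `predicted_av` with an observed audiovisual matrix would then indicate how close a listener comes to that bound.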

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Predicting consonant intelligibility for normal-hearing listeners using microscopic models with different distance measures in an automatic speech recognizer

In this study, recognition rates of consonants in vowel-consonant-vowel structures are investigated in hearing tests and with two microscopic models. Such a syllable structure does not exist in Farsi or Azerbaijani, but since the goal is only recognition of the middle phoneme, according to hearing tests, listeners are able to properly recognize phonemes in clean speech conditions....

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers because of its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...

Cognitive competences, theory of mind, and visual memory in hard-of-hearing children

Hearing problems in hard-of-hearing children affect social interaction as well as communication skills. One aspect of social recognition that has attracted increasing attention in recent years is the development of children's intelligence theory. In connection with intellectual and recognition abilities in children hard of hearing, intelligence is a subject that has always be...

Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space

Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal has nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Publication date: 2003